Installation


In [1]:
import arcpy as ARCPY
import arcgisscripting as ARC
import SSDataObject as SSDO
import SSUtilities as UTILS
import WeightsUtilities as WU
import numpy as NUM
import scipy as SCIPY
import pysal as PYSAL
import os as OS
import pandas as PANDAS

Example: Testing the Income Convergence Hypothesis in California Counties (1969 - 2010)

  • Use the Auto-Model Spatial Econometric Tool to identify the appropriate model

  • Regressing the growth rate of incomes on the log of starting incomes

    • a significant negative coefficient indicates convergence
  • The percentage of the population without a high school education (PERCNOHS) and the 1969 population (POP1969) are the other exogenous variables.
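
To make the convergence test concrete, here is a minimal sketch of the idea on synthetic data (not the California data): growth is regressed on the log of starting income, and a negative slope is read as convergence. The function name `ols_slope_intercept`, the sample values, and the noise level are all illustrative assumptions, and the sketch leaves out the other regressors and any significance test.

```python
import random


def ols_slope_intercept(x, y):
    """Closed-form simple OLS for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic example (assumed values): regions that start poorer grow faster.
rng = random.Random(0)
log_start_income = [rng.uniform(-0.2, 0.3) for _ in range(58)]
growth = [0.5 - 0.2 * x + rng.gauss(0, 0.02) for x in log_start_income]

intercept, slope = ols_slope_intercept(log_start_income, growth)
converging = slope < 0  # a negative slope is the convergence signal
print("slope on log starting income: %.3f" % slope)
```

In the notebook itself the same question is answered by the LOGPCR69 coefficient in the regression reports, where its sign and its p-value are read together.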

Importing Your Data into a PANDAS DataFrame


In [2]:
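inputFC = r'../data/CA_Polygons.shp'
# Resolve the absolute path; the folder is reused later to locate the weights files
fullFC = OS.path.abspath(inputFC)
fullPath, fcName = OS.path.split(fullFC)
# Wrap the feature class and read the model fields, keyed by the unique ID field
ssdo = SSDO.SSDataObject(inputFC)
uniqueIDField = "MYID"
fieldNames = ['GROWTH', 'LOGPCR69', 'PERCNOHS', 'POP1969']
ssdo.obtainData(uniqueIDField, fieldNames)
# Convert to a pandas DataFrame and preview the first five rows
df = ssdo.getDataFrame()
print(df.head())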


       GROWTH  LOGPCR69  PERCNOHS  POP1969
158  0.011426  0.176233      37.0  1060099
159 -0.137376  0.214186      38.3      398
160 -0.188417  0.067722      41.4    11240
161 -0.085070 -0.118248      42.9   101057
162 -0.049022 -0.081377      48.1    13328

Use the PySAL-ArcGIS Utilities to Read in Spatial Weights Files


In [3]:
import pysal2ArcUtils as PYSAL_UTILS
swmFile = OS.path.join(fullPath, "queen.swm")
W = PYSAL_UTILS.PAT_W(ssdo, swmFile)
w = W.w

kernelSWMFile = OS.path.join(fullPath, "knn8.swm")
KW = PYSAL_UTILS.PAT_W(ssdo, kernelSWMFile)
kw = KW.w
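
As background on what a weights object is used for, the sketch below builds a tiny contiguity structure by hand and computes a spatial lag, that is, each region's weighted average of its neighbors' values. It assumes row-standardized weights, which is a common convention but is not confirmed here for queen.swm. The region names, values, and helper functions are hypothetical and unrelated to `PAT_W`.

```python
def row_standardize(neighbors):
    """Map {id: [neighbor ids]} to {id: {neighbor id: weight}}, each row summing to 1."""
    return {i: {j: 1.0 / len(js) for j in js} for i, js in neighbors.items() if js}


def spatial_lag(weights, values):
    """Weighted average of neighboring values for every region."""
    return {i: sum(w * values[j] for j, w in row.items())
            for i, row in weights.items()}


# Four regions in a row: A - B - C - D (hypothetical layout and growth rates).
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
growth = {"A": 0.10, "B": -0.20, "C": 0.00, "D": 0.30}

weights = row_standardize(neighbors)
lag = spatial_lag(weights, growth)
for region in sorted(lag):
    print(region, round(lag[region], 3))
```

A variable such as W_GROWTH in a spatial lag model is this kind of neighbor average of the dependent variable.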

Run the Auto Model Class and Export Your Data to an Output Feature Class


In [5]:
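import AutoModel as AUTO
# Dependent variable GROWTH, three explanatory variables, and both weights objects
auto = AUTO.AutoSpace_PySAL(ssdo, "GROWTH", ['LOGPCR69', 'PERCNOHS', 'POP1969'],
                            W, KW, pValue = 0.1, useCombo = True)
# Allow an existing output feature class to be overwritten
ARCPY.env.overwriteOutput = True
outputFC = r'../data/pysal_automodel.shp'
auto.createOutput(outputFC)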

Compare OLS and Spatial Lag Results


In [7]:
print(auto.olsModel.summary)


REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :../data/CA_Polygons.shp
Weights matrix      :   queen.swm
Dependent Variable  :      GROWTH                Number of Observations:          58
Mean dependent var  :     -0.1152                Number of Variables   :           4
S.D. dependent var  :      0.1641                Degrees of Freedom    :          54
R-squared           :      0.5537
Adjusted R-squared  :      0.5290
Sum squared residual:       0.685                F-statistic           :     22.3358
Sigma-square        :       0.013                Prob(F-statistic)     :   1.551e-09
S.E. of regression  :       0.113                Log likelihood        :      46.429
Sigma-square ML     :       0.012                Akaike info criterion :     -84.858
S.E of regression ML:      0.1087                Schwarz criterion     :     -76.616

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     t-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT       0.5972912       0.1097673       5.4414326       0.0000013
            LOGPCR69      -0.0390200       0.1358352      -0.2872601       0.7750127
            PERCNOHS      -0.0170809       0.0025175      -6.7848679       0.0000000
             POP1969      -0.0000000       0.0000000      -0.7126791       0.4791127
------------------------------------------------------------------------------------

REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER           15.894

TEST ON NORMALITY OF ERRORS
TEST                             DF        VALUE           PROB
Jarque-Bera                       2           1.181           0.5541

DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST                             DF        VALUE           PROB
Breusch-Pagan test                3           1.329           0.7222
Koenker-Bassett test              3           1.999           0.5725

DIAGNOSTICS FOR SPATIAL DEPENDENCE
TEST                           MI/DF       VALUE           PROB
Lagrange Multiplier (lag)         1           5.330           0.0210
Robust LM (lag)                   1           5.977           0.0145
Lagrange Multiplier (error)       1           1.336           0.2477
Robust LM (error)                 1           1.983           0.1591
Lagrange Multiplier (SARMA)       2           7.313           0.0258

================================ END OF REPORT =====================================
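
The spatial dependence diagnostics in the report above are what typically drive the choice between OLS, a spatial lag model, and a spatial error model. The sketch below encodes one commonly used decision rule based on the Lagrange Multiplier p-values. Whether `AutoSpace_PySAL` follows exactly this rule is an assumption, and the function `choose_spatial_model` is hypothetical. The 0.1 threshold mirrors the `pValue = 0.1` argument used earlier.

```python
def choose_spatial_model(lm_lag_p, lm_error_p, rlm_lag_p, rlm_error_p, alpha=0.1):
    """Pick 'OLS', 'lag', or 'error' from LM test p-values (illustrative rule)."""
    lag_sig = lm_lag_p < alpha
    err_sig = lm_error_p < alpha
    if not lag_sig and not err_sig:
        return "OLS"    # no evidence of spatial dependence
    if lag_sig and not err_sig:
        return "lag"    # only the lag test rejects
    if err_sig and not lag_sig:
        return "error"  # only the error test rejects
    # Both reject: fall back on the robust versions and take the stronger signal.
    return "lag" if rlm_lag_p <= rlm_error_p else "error"


# p-values copied from the OLS report above.
choice = choose_spatial_model(lm_lag_p=0.0210, lm_error_p=0.2477,
                              rlm_lag_p=0.0145, rlm_error_p=0.1591)
print(choice)
```

Under this rule the lag tests reject while the error tests do not, which is consistent with the spatial lag specification reported as the final model.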

In [9]:
print(auto.finalModel.summary)


REGRESSION
----------
SUMMARY OF OUTPUT: SPATIAL TWO STAGE LEAST SQUARES
--------------------------------------------------
Data set            :../data/CA_Polygons.shp
Weights matrix      :   queen.swm
Dependent Variable  :      GROWTH                Number of Observations:          58
Mean dependent var  :     -0.1152                Number of Variables   :           5
S.D. dependent var  :      0.1641                Degrees of Freedom    :          53
Pseudo R-squared    :      0.6169
Spatial Pseudo R-squared:  0.5131

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT       0.6611717       0.1005943       6.5726567       0.0000000
            LOGPCR69      -0.2400177       0.1394294      -1.7214277       0.0851732
            PERCNOHS      -0.0161070       0.0022769      -7.0739777       0.0000000
             POP1969      -0.0000000       0.0000000      -0.4311762       0.6663403
            W_GROWTH       0.7523195       0.2556755       2.9424783       0.0032560
------------------------------------------------------------------------------------
Instrumented: W_GROWTH
Instruments: W_LOGPCR69, W_PERCNOHS, W_POP1969
================================ END OF REPORT =====================================

Interpreting the Results

  • Although the coefficient on the log of starting incomes (LOGPCR69, -.039) was negative in the OLS model, it was not statistically significant [p-value = .775].
  • In the Spatial Lag Model, the negative coefficient on LOGPCR69 (-.240) was statistically significant at the 10% level (90% confidence) [p-value = .085]. This supports the regional convergence hypothesis for California counties from 1969 to 2010.
  • The population level in 1969 (POP1969) did not appear to contribute to the growth rate of regional incomes in California over the period: its coefficient was insignificant in both models [p-values = .479, .666].
  • The percentage of the population with no high school education (PERCNOHS) appears to be a strong predictor of regional income growth. The statistically significant negative coefficients in both models indicate a negative relationship between this inverse measure of human capital and growth rates [p-values < .0001].
  • The positive and statistically significant coefficient on the spatial lag variable (W_GROWTH, .752, p-value = .003) indicates considerable spillover effects among county growth rates. Counties with higher growth rates tend to be near others with higher growth rates, and vice versa.
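
The p-values quoted for the spatial lag model can be checked by hand, because that report uses z-statistics, which are compared against the standard normal distribution. The sketch below recomputes two-sided p-values from the reported z-statistics using only the standard library. The helper name `two_sided_p_from_z` is illustrative. The OLS report uses t-statistics, which would need a t distribution and are not covered here.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# z-statistics copied from the spatial two stage least squares report.
z_stats = {"LOGPCR69": -1.7214277, "W_GROWTH": 2.9424783}
p_values = {name: two_sided_p_from_z(z) for name, z in z_stats.items()}

for name, p in p_values.items():
    verdict = "significant" if p < 0.10 else "not significant"
    print("%s: p = %.4f (%s at the 10%% level)" % (name, p, verdict))
```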
